{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "038d3b26-49e3-4b1a-a1fa-de586f0586c9",
   "metadata": {},
   "source": [
    "# Optional Lab: Gradient Descent for Logistic Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "235f7838-877e-4d98-96f3-f1d40a4e1c20",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab, you will:\n",
    "- update gradient descent for logistic regression.\n",
    "- explore gradient descent on a familiar data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8da956d-cc4e-4e5c-86a4-e8c3cfb8aff8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import copy, math\n",
    "import numpy as np\n",
    "%matplotlib widget\n",
    "import matplotlib.pyplot as plt\n",
    "from lab_utils_common import  dlc, plot_data, plt_tumor_data, sigmoid, compute_cost_logistic\n",
    "from plt_quad_logistic import plt_quad_logistic, plt_prob\n",
    "plt.style.use('./deeplearning.mplstyle')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6fde024-9a82-47bf-a48c-d5c67af69118",
   "metadata": {},
   "source": [
    "## Data set \n",
    "Let's start with the same two feature data set used in the decision boundary lab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62794deb-122c-4ba4-b04f-5b3bec3e8304",
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y_train = np.array([0, 0, 0, 1, 1, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcc048b6-b74d-43d8-8f17-e083d33fdf23",
   "metadata": {},
   "source": [
    "As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as blue circles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37f852de-11c0-4e6f-bc07-663728144a9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig,ax = plt.subplots(1,1,figsize=(4,4))\n",
    "plot_data(X_train, y_train, ax)\n",
    "\n",
    "ax.axis([0, 4, 0, 3.5])\n",
    "ax.set_ylabel('$x_1$', fontsize=12)\n",
    "ax.set_xlabel('$x_0$', fontsize=12)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "449f22e4-eb90-4810-b0d1-c8ae7ecacba2",
   "metadata": {},
   "source": [
    "## Logistic Gradient Descent\n",
    "<img align=\"right\" src=\"./images/C1_W3_Logistic_gradient_descent.png\"     style=\" width:400px; padding: 10px; \" >\n",
    "\n",
    "Recall the gradient descent algorithm utilizes the gradient calculation:\n",
    "$$\\begin{align*}\n",
    "&\\text{repeat until convergence:} \\; \\lbrace \\\\\n",
    "&  \\; \\; \\;w_j = w_j -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1}  \\; & \\text{for j := 0..n-1} \\\\ \n",
    "&  \\; \\; \\;  \\; \\;b = b -  \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\\\\n",
    "&\\rbrace\n",
    "\\end{align*}$$\n",
    "\n",
    "Where each iteration performs simultaneous updates on $w_j$ for all $j$, where\n",
    "$$\\begin{align*}\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}  &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
    "\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}  &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3} \n",
    "\\end{align*}$$\n",
    "\n",
    "* m is the number of training examples in the data set      \n",
    "* $f_{\\mathbf{w},b}(x^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target\n",
    "* For a logistic regression model  \n",
    "    $z = \\mathbf{w} \\cdot \\mathbf{x} + b$  \n",
    "    $f_{\\mathbf{w},b}(x) = g(z)$  \n",
    "    where $g(z)$ is the sigmoid function:  \n",
    "    $g(z) = \\frac{1}{1+e^{-z}}$   \n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af9a2ac9-0ac2-4eb1-9c83-a2b434c9dcb8",
   "metadata": {},
   "source": [
    "### Gradient Descent Implementation\n",
    "The gradient descent algorithm implementation has two components: \n",
    "- The loop implementing equation (1) above. This is `gradient_descent` below and is generally provided to you in optional and practice labs.\n",
    "- The calculation of the current gradient, equations (2,3) above. This is `compute_gradient_logistic` below. You will be asked to implement this week's practice lab.\n",
    "\n",
    "#### Calculating the Gradient, Code Description\n",
    "Implements equation (2),(3) above for all $w_j$ and $b$.\n",
    "There are many ways to implement this. Outlined below is this:\n",
    "- initialize variables to accumulate `dj_dw` and `dj_db`\n",
    "- for each example\n",
    "    - calculate the error for that example $g(\\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b) - \\mathbf{y}^{(i)}$\n",
    "    - for each input value $x_{j}^{(i)}$ in this example,  \n",
    "        - multiply the error by the input  $x_{j}^{(i)}$, and add to the corresponding element of `dj_dw`. (equation 2 above)\n",
    "    - add the error to `dj_db` (equation 3 above)\n",
    "\n",
    "- divide `dj_db` and `dj_dw` by total number of examples (m)\n",
    "- note that $\\mathbf{x}^{(i)}$ in numpy `X[i,:]` or `X[i]`  and $x_{j}^{(i)}$ is `X[i,j]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c90c465-5b12-46cc-aa04-41b457c24bed",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_logistic(X, y, w, b): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X (ndarray (m,n): Data, m examples with n features\n",
    "      y (ndarray (m,)): target values\n",
    "      w (ndarray (n,)): model parameters  \n",
    "      b (scalar)      : model parameter\n",
    "    Returns\n",
    "      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
    "      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b. \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    dj_dw = np.zeros((n,))                           #(n,)\n",
    "    dj_db = 0.\n",
    "\n",
    "    for i in range(m):\n",
    "        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar\n",
    "        err_i  = f_wb_i  - y[i]                       #scalar\n",
    "        for j in range(n):\n",
    "            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar\n",
    "        dj_db = dj_db + err_i\n",
    "    dj_dw = dj_dw/m                                   #(n,)\n",
    "    dj_db = dj_db/m                                   #scalar\n",
    "        \n",
    "    return dj_db, dj_dw  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c4e5f34-fd61-4640-b0ee-0f6c6641012d",
   "metadata": {},
   "source": [
    "Check the implementation of the gradient function using the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10deeb4d-bdab-4a9f-aa41-49bab8f3156e",
   "metadata": {},
   "outputs": [],
   "source": [
    "X_tmp = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y_tmp = np.array([0, 0, 0, 1, 1, 1])\n",
    "w_tmp = np.array([2.,3.])\n",
    "b_tmp = 1.\n",
    "dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_tmp, y_tmp, w_tmp, b_tmp)\n",
    "print(f\"dj_db: {dj_db_tmp}\" )\n",
    "print(f\"dj_dw: {dj_dw_tmp.tolist()}\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98f0b5af-8ecf-412d-9e8b-437ed22631fb",
   "metadata": {},
   "source": [
    "**Expected output**\n",
    "``` \n",
    "dj_db: 0.49861806546328574\n",
    "dj_dw: [0.498333393278696, 0.49883942983996693]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6e58c2b-e524-4e14-ad68-542f56ab2e97",
   "metadata": {},
   "source": [
    "#### Gradient Descent Code \n",
    "The code implementing equation (1) above is implemented below. Take a moment to locate and compare the functions in the routine to the equations above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf185b79-9d84-423a-a4d9-5e0d188f0643",
   "metadata": {},
   "outputs": [],
   "source": [
    "def gradient_descent(X, y, w_in, b_in, alpha, num_iters): \n",
    "    \"\"\"\n",
    "    Performs batch gradient descent\n",
    "    \n",
    "    Args:\n",
    "      X (ndarray (m,n)   : Data, m examples with n features\n",
    "      y (ndarray (m,))   : target values\n",
    "      w_in (ndarray (n,)): Initial values of model parameters  \n",
    "      b_in (scalar)      : Initial values of model parameter\n",
    "      alpha (float)      : Learning rate\n",
    "      num_iters (scalar) : number of iterations to run gradient descent\n",
    "      \n",
    "    Returns:\n",
    "      w (ndarray (n,))   : Updated values of parameters\n",
    "      b (scalar)         : Updated value of parameter \n",
    "    \"\"\"\n",
    "    # An array to store cost J and w's at each iteration primarily for graphing later\n",
    "    J_history = []\n",
    "    w = copy.deepcopy(w_in)  #avoid modifying global w within function\n",
    "    b = b_in\n",
    "    \n",
    "    for i in range(num_iters):\n",
    "        # Calculate the gradient and update the parameters\n",
    "        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)   \n",
    "\n",
    "        # Update Parameters using w, b, alpha and gradient\n",
    "        w = w - alpha * dj_dw               \n",
    "        b = b - alpha * dj_db               \n",
    "      \n",
    "        # Save cost J at each iteration\n",
    "        if i<100000:      # prevent resource exhaustion \n",
    "            J_history.append( compute_cost_logistic(X, y, w, b) )\n",
    "\n",
    "        # Print cost every at intervals 10 times or as many iterations if < 10\n",
    "        if i% math.ceil(num_iters / 10) == 0:\n",
    "            print(f\"Iteration {i:4d}: Cost {J_history[-1]}   \")\n",
    "        \n",
    "    return w, b, J_history         #return final w,b and J history for graphing\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "135780cf-214a-4d5e-9c11-28eabe9dc42f",
   "metadata": {},
   "source": [
    "Let's run gradient descent on our data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52815246-1d10-4fad-9a87-eeb3a58f0176",
   "metadata": {},
   "outputs": [],
   "source": [
    "w_tmp  = np.zeros_like(X_train[0])\n",
    "b_tmp  = 0.\n",
    "alph = 0.1\n",
    "iters = 10000\n",
    "\n",
    "w_out, b_out, _ = gradient_descent(X_train, y_train, w_tmp, b_tmp, alph, iters) \n",
    "print(f\"\\nupdated parameters: w:{w_out}, b:{b_out}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3466baef-bfc5-418c-a903-0a80c843ef61",
   "metadata": {},
   "source": [
    "#### Let's plot the results of gradient descent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1d946da-f5a2-404b-aaf3-ad9306581e74",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig,ax = plt.subplots(1,1,figsize=(5,4))\n",
    "# plot the probability \n",
    "plt_prob(ax, w_out, b_out)\n",
    "\n",
    "# Plot the original data\n",
    "ax.set_ylabel(r'$x_1$')\n",
    "ax.set_xlabel(r'$x_0$')   \n",
    "ax.axis([0, 4, 0, 3.5])\n",
    "plot_data(X_train,y_train,ax)\n",
    "\n",
    "# Plot the decision boundary\n",
    "x0 = -b_out/w_out[1]\n",
    "x1 = -b_out/w_out[0]\n",
    "ax.plot([0,x0],[x1,0], c=dlc[\"dlblue\"], lw=1)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2af4d30-b49d-4062-8820-87558039cbc8",
   "metadata": {},
   "source": [
    "In the plot above:\n",
    " - the shading reflects the probability y=1 (result prior to decision boundary)\n",
    " - the decision boundary is the line at which the probability = 0.5\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d17c626-93e8-49f5-ba04-0f70cad94174",
   "metadata": {},
   "source": [
    "## Another Data set\n",
    "Let's return to a one-variable data set. With just two parameters, $w$, $b$, it is possible to plot the cost function using a contour plot to get a better idea of what gradient descent is up to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "704c2a67-3e63-4b2b-90d8-170705920e2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "x_train = np.array([0., 1, 2, 3, 4, 5])\n",
    "y_train = np.array([0,  0, 0, 1, 1, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e8757f1-af25-4ec8-87e3-13c285ee80fb",
   "metadata": {},
   "source": [
    "As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as black circles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed37764e-ff84-44df-beca-12d83803782d",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig,ax = plt.subplots(1,1,figsize=(4,3))\n",
    "plt_tumor_data(x_train, y_train, ax)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4151740-e04f-4705-a9bb-5529811f58f6",
   "metadata": {},
   "source": [
    "In the plot below, try:\n",
    "- changing $w$ and $b$ by clicking within the contour plot on the upper right.\n",
    "    - changes may take a second or two\n",
    "    - note the changing value of cost on the upper left plot.\n",
    "    - note the cost is accumulated by a loss on each example (vertical dotted lines)\n",
    "- run gradient descent by clicking the orange button.\n",
    "    - note the steadily decreasing cost (contour and cost plot are in log(cost) \n",
    "    - clicking in the contour plot will reset the model for a new run\n",
    "- to reset the plot, rerun the cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "053a8687-9426-4468-b52e-9a68a81e5b4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "w_range = np.array([-1, 7])\n",
    "b_range = np.array([1, -14])\n",
    "quad = plt_quad_logistic( x_train, y_train, w_range, b_range )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab3a295b-682e-4d6d-b242-93d0514b0e63",
   "metadata": {},
   "source": [
    "## Congratulations!\n",
    "You have:\n",
    "- examined the formulas and implementation of calculating the gradient for logistic regression\n",
    "- utilized those routines in\n",
    "    - exploring a single variable data set\n",
    "    - exploring a two-variable data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "085a7f94-d988-4e2d-94ee-57888c302d77",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
